ABSTRACT The quantification of spontaneous mutation rates is crucial for a mechanistic understanding of 
the evolutionary process. In bacteria, traditional estimates using experimental or comparative genetic 
methods are prone to statistical uncertainty and consequently estimates vary by over one order of 
magnitude. With the advent of next-generation sequencing, more accurate estimates are now possible. We 
sequenced 19 Escherichia coli genomes from a 40,000-generation evolution experiment and directly 
inferred the point-mutation rate based on the accumulation of synonymous substitutions. The resulting 
estimate was 8.9 × 10⁻¹¹ per base-pair per generation, and there was a significant bias toward increased 
AT-content. We also compared our results with published genome sequence datasets for other bacterial 
evolution experiments. Given the power of our approach, our estimate represents the most accurate 
measure of bacterial base-substitution rates available to date. 
Mutations and genetic recombination provide the variation that fuels 
adaptation. Knowledge of mutation rates is therefore an important 
component of a quantitative evolutionary theory (Lynch 2010). In 
bacteria, spontaneous base-substitution rates have been estimated by 
Luria-Delbrück fluctuation tests using selective conditions (Drake 
1991; Lynch 2006, 2010 and references therein) and by comparing 
DNA sequences from lineages with approximately known divergence 
times (Ochman et al. 1999). Both methods have limitations. The 
former requires knowledge of the mutational target size for the 
relevant phenotype and makes assumptions concerning growth and 
selection that do not always hold in practice (Sniegowski and Lenski 
1995). The latter assumes that synonymous substitutions are 
selectively neutral, requires estimates of generation times in nature, 
and is subject to additional uncertainty when there is recombination or 
selection on codon usage and GC-content (Balbi et al. 2009; Sharp et al. 
2010; Touchon et al. 2009). Given these uncertainties, it is not 
surprising that the mutation rates estimated for E. coli using these two 
approaches differ by more than an order of magnitude (Drake 1991; 
Ochman et al. 1999). 

More direct measurements of mutation rates are now possible 
using whole-genome sequences of isolates sampled from evolution 
experiments. We have previously applied this approach to one 
population from the long-term evolution experiment with E. coli 
(Barrick et al. 2009; Barrick and Lenski 2009), in which 12 populations 
have been propagated independently for over 40,000 generations 
(Lenski 2004; Philippe et al. 2007). Here, we resequenced genomes 
of 19 clones that were sampled from 8 populations (Table 1 and 
supporting information, Table S1) that did not evolve elevated mutation 
rates early in the experiment (Cooper and Lenski 2000; Sniegowski 
et al. 1997). 

MATERIALS AND METHODS 
Mutation identification 

Genomes were resequenced on the Illumina Genome Analyzer 
platform using one lane of single-end 36-bp reads per genome. 
Candidate point mutations were identified in comparison to the 
ancestral genome of REL606 [GenBank:NC_012967.1] using three 
computational approaches: (i) the SNiPer pipeline (Marchetti et al. 
2010); (ii) the breseq pipeline (Barrick et al. 2009, freely available 
online at http://barricklab.org/breseq); and (iii) an unpublished 
algorithm (O. Tenaillon). All candidates were then examined manually to 
account for local misalignment errors relative to the reference genome 
that resulted from gene conversion events, mobile element insertions, 
and large insertions and deletions. Table S1 presents the resulting 
consensus list of all synonymous substitutions arranged by population 
and clone. The dN/dS ratios were calculated for each clone according 
to Comeron (1995) as implemented in the libsequence library (Thornton 
2003). 

Synonymous target site calculations 

For whole-genome studies of mutations in bacterial evolution 
experiments, we used in-house scripts to calculate the exact number of 
protein-coding sites in the ancestral genome according to gene 
annotations. The effective number of synonymous target sites was 
approximated as one-third of this number, as three mutational 
changes are possible from any ancestral base. This analysis does not 
take into account base composition effects or the small changes in 
genome size during these experiments. The sequence records used for 
other published studies were downloaded from GenBank (Accessions: 
NC_000913.2, AC_000091.1, NC_008095.1, and NC_003197.1). For 
our dataset, we used the GenBank sequence record for E. coli B strain 
REL606 (Accession: NC_012967.1) with updated gene annotations. 
Data files and Perl scripts for performing this analysis are available 
on J.E.B.'s web site (http://barricklab.org/amr). 
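
As a rough illustration of the approximation described above, the following Python sketch counts protein-coding sites from a list of annotated gene intervals and takes one-third of that count as the effective number of synonymous target sites. It is not the authors' Perl code: the function name, the input format, and the toy coordinates are hypothetical, and counting overlapping annotations only once is an assumption made here.

```python
def effective_synonymous_sites(cds_intervals):
    """Return (coding_sites, effective_synonymous_sites).

    cds_intervals: iterable of (start, end) pairs, 1-based and inclusive,
    one per annotated protein-coding gene (hypothetical input format).
    """
    covered = 0
    last_end = 0
    # Walk the intervals in order so that overlapping annotations
    # contribute each genomic site only once (an assumption of this sketch).
    for start, end in sorted(cds_intervals):
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    # One-third of the coding sites approximates the synonymous target.
    return covered, covered / 3.0


# Toy annotation: three genes, two of which overlap by 10 bp.
coding_sites, synonymous_sites = effective_synonymous_sites(
    [(1, 300), (291, 590), (1000, 1299)]
)
print(coding_sites, round(synonymous_sites, 1))
```

Applied to a full annotation, the second return value would play the role of the synonymous target sizes listed in Table 2.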



Table 1 Description of 35 synonymous mutations observed in 19 genomes sampled from eight evolving populations 

Population  Genome Position^a  Gene       Base Change  Sequenced Clones^b
Ara-1       -                  -          -            20K-A, 20K-B, 20K-C
Ara-3       756,799            tolR       C→T          30K-B, 40K
            2,613,609          purL       G→A          30K-B
            2,642,843          yfiQ       G→T          30K-B
            2,983,794          yggW       C→T          40K
            3,141,566          ygjE       C→T          40K
            3,407,922          kefB       C→A          40K
            4,111,342          metL       C→T          30K-A
            4,177,963          hemE                    30K-A
            4,107,018          ECB_03822  T→A          30K-B, 40K
            4,313,510          eptA       C→T          40K
Ara-5       157,626            htrE       A→T          40K-B
            307,594            yahC       C→T          40K-A, 40K-B, 40K-C
            3,107,610          ygiN       T→A          40K-A, 40K-B, 40K-C
Ara-6       857,058            moeB       C→T          40K-B
            1,352,030          sapC       G→T          40K-B
            2,087,738          mdtA       C→A          40K-A, 40K-B
            2,095,621          mdtD       G→A          40K-A
            3,482,212          malT       G→A          40K-B
Ara+1       132,062            lpd        C→T          40K-A
            239,002            dnaQ       A→C          40K-B
            3,124,208          yqiI       G→A          40K-A
            3,308,106          yhcB       G→A          40K-A, 40K-B
            3,409,316          yheS       T→G          40K-A, 40K-B
            3,527,027          livH       C→A          40K-B
            3,910,606          yifB       T→G          40K-B
            4,133,104          ppc        G→A          40K-A, 40K-B
Ara+2       1,083,668          wrbA       C→T          40K-A
                                                       40K-B
Ara+4       420,328            cyoB       A→C          40K-A, 40K-B
            2,772,320          lap        A→C          40K-A, 40K-B
            3,061,109          ECB_02854  G→A          40K-A
Ara+5       122,591            ampE       T→A          40K-A, 40K-B
            212,865            ldcC       T→C          40K-A, 40K-B
            1,317,194          trpC       G→A          40K-A, 40K-B
            2,009,188          yoeF       G→T          40K-A, 40K-B
            2,251,393          napA       G→A          40K-A, 40K-B

^a Genome position in the ancestral reference strain REL606 [GenBank:NC_012967.1]. 
^b 20K, 30K, and 40K indicate clones sampled after 20,000, 30,000, and 40,000 generations, respectively, and labels A, B, 
and C indicate different clones from the same generation. 

Table 2 Base-substitution rates estimated from evolution experiments with whole-genome data 

Study                      Bacterial Strain              Clones  Cumulative   Synonymous    Synonymous  μ × 10¹¹ (per bp
                                                                 Generations  Sites (bp)^a  Mutations   per generation)^b
This study                 Escherichia coli B REL606     19      300,000      941,000       25 (52)^c   8.9 [5.7-13]
Conrad et al. (2009),      Escherichia coli K-12 MG1655  12      10,700       930,000       5           50 [16-120]
  Lee and Palsson (2010)
Kishimoto et al. (2010)    Escherichia coli W3110        4       13,850       945,000       2           15 [1.9-55]
Lind and Andersson (2008)  Salmonella typhimurium LT2    1       5000         990,000       2           40 [4.9-150]
Velicer et al. (2006)      Myxococcus xanthus DK1622     1       1000         2,140,000     1           47 [1.2-260]

For these calculations, we used only independently evolved end-point clones, and we pooled data from replicate lineages 
started from the same ancestral strain. 

^a The effective synonymous target size was calculated from the ancestral genome sequences (see Materials and Methods). 

^b The mutation rate μ (per bp per generation) was calculated as the number of observed synonymous mutations divided by the product of the total number of 
generations and the effective number of synonymous target sites. Brackets indicate 95% confidence limits estimated from a binomial distribution. These estimates 
do not take into account base composition or changes in genome size. 

^c For comparison with the other datasets, we used only the first clone sequenced at the latest nonmutator time point from each of the eight long-term populations: 
20K-A for Ara-1, 40K for Ara-3, and 40K-A for the other six populations (Table 1). There were 25 synonymous mutations in these clones and 52 overall in the 
dataset. A more accurate estimate of μ and its uncertainty for the long-term lines takes into account the multiple clones sequenced from the same population, the 
pseudo-replication of clones from the same population, the base signatures of the mutations, and changes in genome size. That comprehensive analysis yields 
8.9 [4.0-14] × 10⁻¹¹ per bp per generation (see text). 
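
The simple rate calculation described in the Table 2 footnotes can be sketched in Python as follows. This is an illustration rather than the authors' scripts: the function names are hypothetical, and the confidence limits use an exact Poisson interval as a stand-in for the binomial limits, on the assumption that the two agree closely when the number of site-generations is very large and the per-site probability is very small.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect_decreasing(f, lo, hi, iterations=200):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_limits(k, alpha=0.05):
    """Exact two-sided confidence limits for a Poisson mean given k events."""
    bound = 10.0 * k + 50.0  # generous search range for the limits
    upper = bisect_decreasing(
        lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, bound)
    if k == 0:
        return 0.0, upper
    lower = bisect_decreasing(
        lambda lam: poisson_cdf(k - 1, lam) - (1 - alpha / 2), 0.0, bound)
    return lower, upper


def rate_with_ci(mutations, generations, sites):
    """Rate per bp per generation with approximate 95% limits."""
    exposure = generations * sites  # total site-generations observed
    lower, upper = exact_poisson_limits(mutations)
    return mutations / exposure, lower / exposure, upper / exposure


# Inputs taken from the first row of Table 2.
rate, low, high = rate_with_ci(25, 300_000, 941_000)
print(f"{rate:.2e} [{low:.2e}, {high:.2e}]")
```

With the inputs from the first row of Table 2, the output should be close to the tabulated 8.9 [5.7-13] × 10⁻¹¹; the other rows can be checked the same way.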



Mutation rate estimate 

We used a maximum-likelihood approach to estimate the rates of all 
six possible types of base-pair substitution mutations. This approach 
assumed that synonymous substitutions of a given type accumulated 
as a Poisson process with an expected number equal to the mutation 
rate multiplied by the number of generations elapsed and the total 
number of genomic sites at risk for synonymous substitutions of that 
type. This last factor corrected for regions of the ancestral genome 
where mutations could not be called in an evolved genome due to 
deletions, low coverage, or repetitive sequences, as output by the 
breseq pipeline. 

We corrected for pseudo-replication due to shared evolutionary 
history by averaging the calculated log likelihoods for genomes within 
population blocks. The overall point-mutation rate was then 
calculated by weighting the separately estimated rates for each type of 
mutation by the frequency of corresponding sites in the ancestral 
genome. Tukey's jackknife method was used to estimate overall 
confidence limits from the statistics of resampled (delete-1) datasets that 
each dropped all genomes from a single population. Data files and 
Perl and R scripts for performing this analysis are available on J.E.B.'s 
web site (http://barricklab.org/amr). 
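
The estimation procedure described in this section can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the authors' Perl and R code: the data layout, the function names, and the toy numbers are hypothetical, the critical value `t_crit` must be supplied by the user (for example, about 2.365 for eight populations and 7 degrees of freedom), and the closed-form estimator follows from maximizing Poisson log likelihoods averaged within each population.

```python
import math


def type_rates(populations):
    """Poisson maximum-likelihood rate for each mutation type.

    populations: {name: [genome, ...]}; each genome is a dict with
    'generations', 'counts' {type: mutations} and 'sites' {type: sites
    at risk}. Averaging log likelihoods within a population gives every
    population a total weight of 1 in the closed form used here.
    """
    events, exposure = {}, {}
    for genomes in populations.values():
        w = 1.0 / len(genomes)
        for g in genomes:
            for t, s in g["sites"].items():
                events[t] = events.get(t, 0.0) + w * g["counts"].get(t, 0)
                exposure[t] = exposure.get(t, 0.0) + w * g["generations"] * s
    return {t: events[t] / exposure[t] for t in exposure}


def overall_rate(populations, site_freq):
    """Weight per-type rates by the ancestral frequency of their sites."""
    rates = type_rates(populations)
    return sum(site_freq[t] * r for t, r in rates.items())


def jackknife_ci(populations, site_freq, t_crit):
    """Delete-one-population jackknife: (estimate, lower, upper)."""
    n = len(populations)
    full = overall_rate(populations, site_freq)
    pseudo = []
    for left_out in populations:
        subset = {k: v for k, v in populations.items() if k != left_out}
        pseudo.append(n * full - (n - 1) * overall_rate(subset, site_freq))
    mean = sum(pseudo) / n
    var = sum((p - mean) ** 2 for p in pseudo) / (n - 1)
    half = t_crit * math.sqrt(var / n)
    return mean, mean - half, mean + half


# Toy data (hypothetical): two populations, one mutation type "x".
toy = {
    "popA": [{"generations": 1000, "counts": {"x": 1}, "sites": {"x": 1000}}],
    "popB": [{"generations": 1000, "counts": {"x": 3}, "sites": {"x": 1000}}],
}
estimate, lower, upper = jackknife_ci(toy, {"x": 1.0}, t_crit=1.0)
print(estimate, lower, upper)
```

In a real analysis the `sites` entries would also exclude regions where mutations could not be called in a given genome, as described above.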

RESULTS AND DISCUSSION 

We analyzed synonymous substitutions because, when examining all 
mutations in the 19 clones, we found dN/dS ratios higher than 1.0 for 
all but one (Table S1). This observation supports pervasive ongoing 
positive selection through 40,000 generations in these experimental 
populations (Barrick et al. 2009). Therefore, non-synonymous 
mutations are inappropriate for estimating the point-mutation rate. 

From population genetics theory, the expected number of 
synonymous mutations in an evolved clone relative to its ancestor is equal 
to the product of the intrinsic base-substitution rate, the number of 
genomic sites at risk for synonymous mutations, and the number of 
elapsed generations (Kimura 1983). The only requisite assumption is 
that most synonymous mutations are selectively neutral. Importantly, 
the expected rate of accumulation of neutral mutations in the lineage 
leading to any particular clone is not affected by selection at other sites 
in the genome, because an asexual lineage simply represents a chain of 
replication events spanning the specified number of generations (Barrick 
et al. 2009; Kimura 1983). 



We observed a total of 52 synonymous substitutions in the 19 
resequenced genomes (Table S1). However, multiple genomes 
sampled from the same population are not independent because they 
share some portion of their history; thus, there were only 35 
mutational events (Table 1). We used a resampling procedure to account 
for this pseudo-replication of multiple genomes isolated from a single 
population (see supporting information). The resulting estimate of the 
point-mutation rate is 8.9 × 10⁻¹¹ per bp per generation (Tukey's 
jackknife 95% confidence interval, 4.0-14 × 10⁻¹¹ per bp per 
generation). This estimate corresponds to a total genomic rate of 0.00041 
per generation given the ancestral genome size of 4.6 × 10⁶ bp. 
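
The conversion from the per-site rate to the genomic rate is a single multiplication, shown here as a short Python check. The variable names are illustrative, and the final line, which expresses the rate as the approximate number of generations per genome-wide point mutation, is an added restatement rather than a figure from the analysis.

```python
per_bp_rate = 8.9e-11     # point mutations per bp per generation
genome_size_bp = 4.6e6    # ancestral genome size in bp

# Genomic rate: expected point mutations per genome per generation.
genomic_rate = per_bp_rate * genome_size_bp
print(f"{genomic_rate:.5f}")

# Equivalent view: generations per expected genome-wide point mutation.
print(round(1.0 / genomic_rate))
```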

Our inferred point-mutation rate is intermediate between previous 
estimates based on experimental (Drake 1991) and comparative 
methods (Ochman et al. 1999). These earlier studies yielded estimates of 
5.4 × 10⁻¹⁰ per bp per generation and 1.5 to 4.5 × 10⁻¹¹ per bp per 
generation, respectively. Given the limitations of these approaches as 
noted above, our estimate is probably more accurate. This greater 
accuracy derives from the accumulation of mutational events across 
300,000 generations (summed over the eight replicate populations) 
and over the entire genome, coupled with precise knowledge of the 
number of elapsed generations and the reasonable presumption of 
selective neutrality or near-neutrality for most synonymous 
mutations. At the same time, it must also be emphasized that mutation 
rates may differ between strains and species, and they may change 
depending on the environmental conditions experienced by the cells 
(Bjedov et al. 2003). 

To put our estimate into context, we performed a similar analysis 
of all other published whole-genome datasets for bacterial evolution 
experiments with known numbers of generations (Table 2). Taking 
the other experiments together, we found 10 synonymous SNPs in 18 
independently evolved (nonmutator) clones in a total of 30,550 
generations. These other datasets combined thus provide only ~10% of 
the power, in terms of cumulative generations, of the long-term 
dataset that we have generated and analyzed. As a consequence, the 
estimated point-mutation rates for these other experimental systems are 
subject to much greater statistical uncertainty. 

Figure 1 Expected and observed mutational spectra for synonymous 
point mutations. White and black bars show the expected and 
observed base-pair changes, respectively. The expected values reflect 
the actual base-pair frequencies in the genome and the probability 
that a particular base-pair mutation (e.g., from C:G to T:A) produces 
a synonymous change. 

With 35 independent synonymous mutations, we were also able to 
examine the mutational spectrum of base substitutions (Figure 1). 
After correcting for the sequence composition of genomic sites at risk 
for synonymous mutations in the ancestral genome, the observed 
transition-to-transversion ratio of 1:1.99 did not differ significantly 
from the 1:2 ratio expected if there were a uniform probability of all 
six base-substitution mutations (two-tailed binomial test, P = 0.61). 
However, transitions were highly skewed. Mutations from C:G to T:A 
were 14.5 times as likely as A:T to G:C mutations after accounting for 
sequence composition (two-tailed binomial test, P = 0.00027). This 
finding is consistent with other recent studies that found a strong 
mutational bias toward increased AT composition in bacteria (Balbi 
et al. 2009; Hershberg and Petrov 2010; Hildebrand et al. 2010). This 
bias in mutation pressure explains the pattern of synonymous 
mutations seen in our study, and it also implies that selection or gene 
conversion must account for the characteristic GC-contents observed 
in divergent groups of bacteria over much longer evolutionary 
timescales (Rocha and Feil 2010). 
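
For readers who wish to run this kind of comparison, a two-tailed exact binomial test can be sketched in Python as follows. This is a generic illustration, not the authors' R code: the function name is hypothetical, the two-tailed rule (summing all outcomes no more likely than the observed one) is one common convention and is assumed here, and the counts and null proportion in the usage lines are toy values. Reproducing the reported P values would require the composition-corrected expected proportions, which are not restated in this section.

```python
from math import comb


def binomial_two_tailed(k, n, p):
    """Exact two-tailed binomial P value.

    Sums the probabilities of every outcome that is no more likely
    under the null proportion p than the observed count k.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Toy example (hypothetical counts): 7 transitions among 20 mutations,
# tested against a null transition proportion of 1/3.
p_value = binomial_two_tailed(7, 20, 1.0 / 3.0)
print(f"{p_value:.3f}")
```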